Abstract

This paper studies a generalization of multi-prover interactive proofs in which a verifier interacts with
two competing teams of provers: one team attempts to convince the verifier to accept while the other
attempts to convince the verifier to reject. Each team consists of two provers who jointly implement a
no-signaling strategy. No-signaling strategies are a curious class of joint strategy that cannot in general be
implemented without communication between the provers, yet cannot be used as a black box to establish
communication between them. Attention is restricted in this paper to two-turn interactions in which the
verifier asks questions of each of the four provers and decides whether to accept or reject based on their
responses.

We prove that the complexity class of decision problems that admit two-turn interactive proofs with 
competing teams of no-signaling provers is a subset of PSPACE. This upper bound matches existing 
PSPACE lower bounds on the following two disparate and weaker classes of interactive proof: 

1. Two-turn multi-prover interactive proofs with only one team of no-signaling provers.

2. Two-turn competing-prover interactive proofs with only one prover per team. 

Our result implies that the complexity of these two models is unchanged by the addition of a second 
competing team of no-signaling provers in the first case and by the addition of a second no-signaling 
prover to each team in the second case. Moreover, our result unifies and subsumes prior PSPACE upper 
bounds on these classes. 

1 Introduction 

Interactive proofs were introduced in the mid-1980s as a generalization of the concept of efficient proof
verification and the complexity class NP [Bab85, BM88, GMR89]. Informally speaking, an interactive
proof is a conversation between a randomized polynomial-time verifier and a computationally unbounded
prover regarding some common input string x. A decision problem L is said to admit an interactive proof
if there exists a verifier such that (i) if x is a yes-instance of L then there is a prover who can convince the
verifier to accept x with high probability, and (ii) if x is a no-instance of L then no prover can convince
the verifier to accept x except with small probability. In a dramatic testament to the surprising power of
randomization and interaction, it was soon discovered that every problem in PSPACE admits an interactive
proof, yielding the well-known identity IP = PSPACE [LFKN92, Sha92].






Multi-prover interactive proofs, no-signaling provers 

The fruitful study of interactive proofs has prompted further generalization of the model. One such
generalization is the multi-prover interactive proof model of Ben-Or et al. [BOGKW88], wherein several provers
cooperate in their attempt to convince the verifier to accept the input string x. The key aspect that sets
this model apart from single-prover interactive proofs is the fact that the provers cannot communicate with
one another during the protocol. Amazingly, this small distinction is enough to increase the power of the
model from PSPACE all the way up to NEXP [BFL91, FRS94], even when the interaction is restricted to
only two turns with only two provers [FL92]. In terms of complexity classes, the corresponding identity is
MIP = NEXP.

Intermediate classes of multi-prover interactive proofs are obtained by tinkering with the set of strategies 
available to the provers. Consider, for example, a joint strategy where the distribution of answers from 
one prover is independent of the question asked of the other prover — these are the no-signaling strategies. 
Clearly, such a strategy cannot be used in a black-box fashion by the provers to establish communication. 
At first glance it may seem that the no-signaling condition is equivalent to the standard definition of a multi- 
prover interactive proof. However, there exist no-signaling strategies that cannot be implemented without 
communication between the provers, suggesting that this model might be a nontrivial intermediary between 
single- and multi-prover interactive proofs. 

Indeed, it was established by Ito, Kobayashi, and Matsumoto [IKM09] that the two-turn, two-prover
protocol for PSPACE of Cai, Condon, and Lipton [CCL94] is sound even against no-signaling provers. By
contrast, PSPACE is known not to admit two-turn single-prover interactive proofs unless the polynomial
hierarchy collapses and PSPACE = AM [Bab85, GS89]. A converse result was proven by Ito, who showed
that every problem that admits a two-turn interactive proof with two no-signaling provers is also in PSPACE
[Ito10]. Thus, the interactive proof model is even more sensitive to change than suggested by the difference
between single- and multi-prover interactive proofs, as even the smaller difference between no-signaling
and standard multi-prover interactive proofs is sufficient to make the jump from PSPACE up to NEXP (at
least in the case of two turns and two provers).

In addition to this prior work, parallel repetition results for multi-prover interactive proofs with
no-signaling provers were established in Refs. [Hol09, KR10]. The reader is referred to Ito [Ito10] for more
detailed history and references.

Inspiration from quantum information 

Though the present paper contains no formal discussion of quantum information, it is proper to acknowledge 
its role in motivating the study of no-signaling provers. Interest in this model was originally drawn from 
the study of multi-prover quantum interactive proofs, in which the provers (and possibly the verifier) are 
permitted to exchange and manipulate quantum information. 

It is easy to see that interactive proofs with ordinary, "classical" provers are not affected by the ability 
of the provers to sample from a common source of randomness. Quantum provers, on the other hand, 
might use shared pieces of some entangled quantum state to implement a nonlocal strategy that correlates 
their messages in ways that cannot otherwise be achieved [Bel64]. (The phenomenon of nonlocality was
famously branded by Einstein as "spooky action at a distance.") Indeed, some classical protocols which
are sound against classical provers are known to become unsound when the provers share entanglement
[CHTW04, CGJ09].

Whereas the set of strategies that admit shared entanglement is highly complex, the set of no-signaling
strategies is relatively simple and it includes entanglement-sharing strategies as a proper subset. So, for
example, any protocol that is sound against no-signaling provers is also sound against quantum provers
who share entanglement. It is also interesting to find differences between no-signaling strategies and
entanglement-sharing strategies, as this difference sheds light on the extent to which no-signaling can be
used as a proxy for shared entanglement. In some protocols the allowance of arbitrary no-signaling
strategies leads to implausible consequences [vD05, BBL+06]. Such protocols can be viewed as mathematical
evidence against physical theories that admit so-called "super-strong" nonlocality such as that found in
no-signaling strategies but not entanglement-sharing strategies. The present paper establishes a scenario in
which two no-signaling provers are equivalent to two signaling provers.

Interactive proofs with competing provers 

Another generalization of the single-prover model is an interactive proof with competing provers, in which
one prover tries to convince the verifier to accept the input string x while the other prover tries to convince
the verifier to reject x. One may consider proofs in which all messages are known to all provers (complete
information) or in which each prover sees only the messages he exchanges with the verifier (incomplete
information). These two forms of competing-prover interactive proofs were studied by several authors in
the 1990s [FST90, FS92, FKS95, FK97]. For our purpose in this paper, however, it only makes sense to consider
protocols with incomplete information.

In the jargon of game theory, interactive proofs with competing provers are zero-sum games, about
which there exists a vast body of literature in computer science, economics, and other disciplines. For
instance, fast algorithms for zero-sum games of incomplete information in extensive form imply that the
complexity class RG of problems that admit interactive proofs with competing provers is a subset of EXP
[KM92, KMvS94]. Feige and Kilian proved the reverse containment [FK97], yielding the competing-prover
analogue RG = EXP of the aforementioned identity IP = PSPACE for single-prover interactive proofs.

Feige and Kilian also studied two-turn interactive proofs with competing provers, providing a matching
upper and lower bound of PSPACE on the complexity of this model [FK97]. The complexity of k-turn
interactive proofs with competing provers for constants k ≥ 3 is an open question of interest to both
complexity theorists and game theorists.

Interactive proofs with competing teams of provers, our result 

Multi-prover interactive proofs and interactive proofs with competing provers are two distinct generaliza- 
tions of the single-prover model. The next logical step is to unify these two generalizations in the obvious 
way via interactive proofs with competing teams of provers. Combining established naming conventions 
for complexity classes based on interactive proofs, we let MRG denote the class of decision problems that 
admit interactive proofs with competing teams of provers. 

To the author's knowledge, this model was considered prior to the present work only by Feigenbaum,
Koller, and Shor [FKS95]. Those authors studied this class under the game-theoretic guise of zero-sum
games of imperfect recall and proved the containments

$$\mathrm{EXP}^{\mathrm{NP}} \subseteq \mathrm{MRG} \subseteq \Sigma_2^{\exp} \cap \Pi_2^{\exp},$$

where $\Sigma_2^{\exp}$ and $\Pi_2^{\exp}$ are classes in the second level of the exponential hierarchy, which is the
exponential-time version of the familiar polynomial hierarchy.

In this paper, we consider interactive proofs with competing teams of no-signaling provers. Our main
result is as follows.

Theorem 1. Every decision problem that admits a two-turn interactive proof with competing teams of two
no-signaling provers per team is also in PSPACE.

This upper bound matches the aforementioned PSPACE lower bounds on the following two disparate 
and weaker classes of interactive proof: 

1. Two-turn multi-prover interactive proofs with only one team of no-signaling provers [CCL94, IKM09].

2. Two-turn competing-prover interactive proofs with only one prover per team [FK97].

Our result implies that the complexity of these two models is unchanged by the addition of a second
competing team of no-signaling provers in the first case and by the addition of a second no-signaling prover to
each team in the second case. Moreover, our result unifies and subsumes prior PSPACE upper bounds on
these classes [Ito10, FK97].

Limitations of the present approach 

Attention is restricted in this paper to interactions with no more than two no-signaling provers per team and 
no more than two messages exchanged with each prover. The purpose for this restriction, quite simply, is 
that this class of interactions appears to be the largest to which our techniques apply. 

For all we know, interactions with three messages for a prover or three provers on a team could be 
sufficiently powerful to capture all of EXP. Indeed, it is consistent with current knowledge that a three- 
message protocol for EXP might require only one prover per team, or that a three-prover no-signaling 
protocol for EXP might require only one team of provers. Given this paucity of upper bounds for similar, 
seemingly weaker models it is hoped that any reservation at the restrictions in our model is more than 
compensated by the fact that we are able to say anything at all about it. 

Let us list some natural extensions of the two-prover, two-turn model and point out exactly where our
method fails for these extensions.

More than two turns, only one prover per team. Perhaps the most important open problem related to our
work is the complexity of k-turn interactive proofs with competing provers for constants k ≥ 3. This
problem, which dates back at least to 1997 [FK97], is still open even in the special case of only one
prover per team. With only one prover per team, the question is really a game-theoretic question with
a much wider application than just interactive proofs.

Our method fails for this case because we do not have a bound on the verifier matrix of the form
$V \le e_{\mathcal{A}_{01}\mathcal{B}_{01}} p^*$ such as that appearing in Proposition 3. Thus, we do not obtain a good enough bound
on the loss vectors appearing in our variant of the multiplicative weights update method.

More than two turns, only one team of no-signaling provers. The complexity of k-turn multi-prover
interactive proofs with two no-signaling provers is still open for k ≥ 3, even with only one team of
provers [Ito10]. For ordinary multi-prover interactive proofs (in which the provers are not allowed to
implement arbitrary no-signaling strategies) it is known that a multi-turn protocol with any number
of provers can be simulated by another protocol with only two turns and two provers [FL92].

Our method fails here for the same reason as above: we cannot bound the loss vectors in the
multiplicative weights update method for a multi-turn verifier.

More than two provers, only one team of no-signaling provers. Similarly, the complexity of two-turn
multi-prover interactive proofs with more than two no-signaling provers is still open, even with only one
team of provers [Ito10]. As mentioned above, ordinary multi-prover interactive proofs require only
two provers [FL92].

Our method does not extend to this case either, as there is no known analogue of Lemma 6 for more
than two provers.

Quantum verifier and/or provers. Even with two no-signaling provers, two turns of interaction, and only
one team of provers, it is still not known that the PSPACE upper bound holds when either the verifier
or provers can send quantum messages [Ito10]. Here the problem is that Lemma 6 does not hold for
quantum states.

Techniques 

Theorem 1 is proven by means of an efficient parallel algorithm that, given an explicit description of a
verifier and an accuracy parameter δ, finds no-signaling strategies for the teams that are within δ of optimal.
Containment in PSPACE then follows in the usual way by observing that the description of the verifier has
size exponential in the length of the input string x and then employing the fact that a parallel algorithm with
succinct input can be simulated in polynomial space [Bor77].

Our algorithm is an example of the multiplicative weights update method (MWUM) as discussed in the
survey paper [AHK05] and in the PhD thesis of Kale [Kal07]. (See also Ref. [WK06].) In its simplest form,
the MWUM solves a min-max optimization problem on probability distributions. In the present paper we
use the MWUM to optimize not just a single distribution, but many distributions simultaneously in the form
of a stochastic matrix that represents a strategy for one of the teams. This trick seems to work only for
two-turn protocols, as otherwise it is not clear how to ensure sufficient accuracy.
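To make the "simplest form" of the MWUM concrete, the following is a minimal sketch for an ordinary zero-sum matrix game over probability vectors. It is not the algorithm of this paper: the function names, the loss matrix `M` with entries in [0, 1], the step size, and the best-response oracle for the maximizing player are all assumptions chosen for illustration, and the step size assumes enough rounds that it stays at most 1/2.

```python
import math

def mwu_min_player(M, rounds):
    """Approximate an optimal mixed strategy for the minimizing (row) player.

    M[i][j] is the loss in [0, 1] paid by row i against column j.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n, m = len(M), len(M[0])
    eta = math.sqrt(math.log(n) / rounds)  # assumed step size
    p = [1.0 / n] * n                      # start from the uniform distribution
    average = [0.0] * n
    for _ in range(rounds):
        # The maximizing player best-responds to the current distribution p.
        j = max(range(m), key=lambda col: sum(p[i] * M[i][col] for i in range(n)))
        average = [a + q / rounds for a, q in zip(average, p)]
        # Multiplicative update: rows with larger loss lose more weight.
        weights = [q * (1.0 - eta * M[i][j]) for i, q in enumerate(p)]
        total = sum(weights)
        p = [w / total for w in weights]
    return average

def worst_case_loss(M, p):
    """Largest expected loss of the mixed strategy p over all columns."""
    return max(sum(p[i] * M[i][col] for i in range(len(M)))
               for col in range(len(M[0])))

# Matching pennies: the equilibrium value is 1/2.
pennies = [[1, 0], [0, 1]]
strategy = mwu_min_player(pennies, 2000)
print(worst_case_loss(pennies, strategy))
```

The averaged distribution is returned rather than the final one because the standard regret guarantee for this update applies to the time average of the iterates.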

Let us compare our algorithm to the two previous algorithms it subsumes: 

• The polynomial-space algorithm of Feige and Kilian for two-turn interactive proofs with competing
provers [FK97] is a complicated and highly specialized precursor to the MWUM that, like our
algorithm, optimizes over stochastic matrices that represent strategies for the provers.

Their algorithm works by nondeterministically guessing the entries of the matrix and scanning them 
in a read-once fashion. This approach cannot be extended to optimize over no-signaling strategies, as 
the read-once model does not allow verification of the no-signaling condition. 

• The parallel algorithm of Ito for two-turn, two-prover interactive proofs with no-signaling provers
[Ito10] is essentially a reduction to the mixed packing and covering problem, which is a special type
of linear program that is known to admit an efficient parallel algorithm [You01].

This approach, too, cannot be extended to competing teams of no-signaling provers, as any linear 
programming formulation of the protocol is unlikely to be a mixed packing and covering problem. 

Our study has benefitted from the valuable experience of recent applications of the MWUM to parallel
algorithms for quantum complexity classes [JW09, JUW09, JJUW10, Wu10, GW11]. Indeed, we follow the
same high-level approach as the recent proof of DQIP = DIP = PSPACE [GW11]. Namely,

• The domain of admissible (no-signaling) strategies is a strict subset of the "natural" domain (stochastic 
matrices) for the MWUM. 

• To get around this problem, the strategy domain is extended to all the stochastic matrices and a
penalty term is introduced so as to remove any incentive for a team to use an inadmissible strategy.
(See Section 3.)

• Finally, one must prove a "rounding" theorem (Corollary 14), which establishes that near-optimal,
fully-admissible strategies can be obtained from near-optimal strategies in the extended domain with
penalty term.

Figure 1: A two-turn interactive proof with competing teams of two no-signaling provers per team.

2 Preliminaries 

2.1 Definition of two-turn interactive proofs with competing teams of provers 

In this paper we are concerned with decision problems that admit two-turn interactive proofs with competing
teams of no-signaling provers. Let us clarify this concept. A two-turn verifier is a randomized
polynomial-time algorithm that, given an input string x, produces questions i, j for the two teams of provers. The teams
select their answers k, l (possibly using randomness to do so) and then the verifier accepts or rejects the
input x according to some boolean function of i, j, k, l. For convenience, the teams shall be called Team
Alice and Team Bob. It is the goal of Team Alice to convince the verifier to accept the input string x, while
Team Bob's goal is to convince the verifier to reject x.

In the protocols we consider, each team consists of two provers. The provers of Team Alice shall be
called Alice$_0$ and Alice$_1$, while the provers of Team Bob shall be called Bob$_0$ and Bob$_1$. Each individual
prover on each team receives his or her own private question and supplies his or her own separate answer to
the verifier. In particular, the question i asked of Team Alice is actually a pair $i = (i_0, i_1)$ with question $i_c$
going to prover Alice$_c$ for both values of the bit $c \in \{0, 1\}$. Similarly, the question j asked of Team Bob
is also a pair $j = (j_0, j_1)$ with question $j_c$ going to prover Bob$_c$. The answers k, l received from the two
teams are also pairs $k = (k_0, k_1)$ and $l = (l_0, l_1)$ with answers $k_c$ and $l_c$ coming from Alice$_c$ and Bob$_c$,
respectively. The entire interaction is illustrated in Figure 1.

Each team may jointly implement any no-signaling strategy in order to produce its answers. Briefly, a
strategy for, say, Team Alice is no-signaling if the marginal distribution on answers $k_0$ from Alice$_0$ does not
depend upon the question $i_1$ asked of Alice$_1$ and vice versa. No-signaling strategies are discussed in greater
detail in Section 2.5.

A decision problem L is said to admit a two-turn interactive proof with competing teams of no-signaling 
provers with completeness c and soundness s if there exists a fixed two-turn verifier with the following 
properties: 

Completeness. If the input string x is a yes-instance of L then there exists a no-signaling strategy for Team
Alice that convinces the verifier to accept x with probability at least c, regardless of the no-signaling
strategy employed by Team Bob.

Soundness. If the input string x is a no-instance of L then there exists a no-signaling strategy for Team Bob
that convinces the verifier to reject x with probability at least $1 - s$, regardless of the no-signaling
strategy employed by Team Alice.

The completeness and soundness parameters need not be fixed constants. Rather, they may vary as a function
of the input string x. The complexity class $\mathrm{MRG}_{\mathrm{ns}}(2, 2)$ consists of all decision problems that admit
two-turn interactive proofs with competing teams of two no-signaling provers per team with completeness c and
soundness s such that there exists a fixed polynomial-bounded function p on strings with $c - s \ge 1/p$.
(The first parameter of the class $\mathrm{MRG}_{\mathrm{ns}}(2, 2)$ denotes the number of provers per team, the second denotes
the number of turns in the protocol. It is also common to parameterize interactive proof classes according
to the number of rounds of communication, rather than the number of turns. Under this scheme, the class
$\mathrm{MRG}_{\mathrm{ns}}(2, 2)$ might be called $\mathrm{MRG}_{\mathrm{ns}}(2, 1)$ by some authors.)

In this paper we prove $\mathrm{MRG}_{\mathrm{ns}}(2, 2) \subseteq \mathrm{PSPACE}$. It then follows from existing lower bounds on weaker
classes [IKM09, FK97] that

$$\mathrm{MRG}_{\mathrm{ns}}(2, 2) = \mathrm{PSPACE}.$$



2.2 Notation, the Kronecker product 

To each interactive proof with input x we associate eight distinct finite-dimensional real Euclidean spaces:
four question spaces and four answer spaces. These spaces are denoted as follows for both $c \in \{0, 1\}$:

$\mathcal{S}_c$   The question space for prover Alice$_c$
$\mathcal{A}_c$   The answer space for prover Alice$_c$
$\mathcal{T}_c$   The question space for prover Bob$_c$
$\mathcal{B}_c$   The answer space for prover Bob$_c$

The dimension of each space is the number of distinct questions or answers available to that prover. (For
example, prover Alice$_0$ can be asked any of $\dim(\mathcal{S}_0)$ distinct questions and may respond with any of
$\dim(\mathcal{A}_0)$ distinct answers.) Individual questions or answers are indexed by positive integers denoted for
both $c \in \{0, 1\}$ as follows:

Questions for Alice$_c$    $i_c = 1, \dots, \dim(\mathcal{S}_c)$
Questions for Bob$_c$      $j_c = 1, \dots, \dim(\mathcal{T}_c)$
Answers from Alice$_c$     $k_c = 1, \dots, \dim(\mathcal{A}_c)$
Answers from Bob$_c$       $l_c = 1, \dots, \dim(\mathcal{B}_c)$

Since the verifier acts in polynomial time, the bit length of the questions and answers is at most a polynomial
in the bit length |x| of the input string x. Since n bits suffice to encode $2^n$ distinct questions or answers, the
dimension of the spaces $\mathcal{S}_c, \mathcal{T}_c, \mathcal{A}_c, \mathcal{B}_c$ can be exponential in |x|.

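The Kronecker product (or tensor product) of two spaces $\mathcal{X}, \mathcal{Y}$ is another space with dimension $\dim(\mathcal{X}) \dim(\mathcal{Y})$.
This product space is typically denoted by $\mathcal{X} \otimes \mathcal{Y}$, which we abbreviate to $\mathcal{X}\mathcal{Y}$. Kronecker products
involving the eight spaces $\mathcal{S}_c, \mathcal{T}_c, \mathcal{A}_c, \mathcal{B}_c$ are further abbreviated so that

$$\mathcal{S}_{01} = \mathcal{S}_0 \otimes \mathcal{S}_1$$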

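and so on. The Kronecker product extends in a natural way to vectors and linear operators. In this paper
each vector or linear operator is implicitly associated with its representation as a column or a matrix, for
which the Kronecker product is given by a straightforward formula. For example, if A, B are 2 × 2 matrices
given by

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} p & q \\ r & s \end{pmatrix},$$

then the Kronecker product $A \otimes B$ is given by

$$A \otimes B = \begin{pmatrix} aB & bB \\ cB & dB \end{pmatrix}
= \begin{pmatrix} a \begin{pmatrix} p & q \\ r & s \end{pmatrix} & b \begin{pmatrix} p & q \\ r & s \end{pmatrix} \\ c \begin{pmatrix} p & q \\ r & s \end{pmatrix} & d \begin{pmatrix} p & q \\ r & s \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} ap & aq & bp & bq \\ ar & as & br & bs \\ cp & cq & dp & dq \\ cr & cs & dr & ds \end{pmatrix}.$$

This definition extends in the obvious way to arbitrary matrices of any dimension, including column vectors
and other non-square matrices.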
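As a small illustration of the block formula above, the following sketch computes the Kronecker product of two matrices stored as lists of rows. The function name `kron` and the list-of-rows representation are assumptions made for this example only.

```python
def kron(A, B):
    """Kronecker product of matrices A and B, each given as a list of rows.

    Row (r_a, r_b) of the result is the concatenation over entries a of
    row r_a of A of the scaled rows a * (row r_b of B), matching the
    block layout [[aB, bB], [cB, dB]].
    """
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
for row in kron(A, B):
    print(row)
```

Column vectors fit the same representation as matrices with a single column, for example `[[1], [2]]`.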

We also make use of the following symbols:

$e_{\mathcal{X}}$   The all-ones vector of dimension $\dim(\mathcal{X})$.

$I_{\mathcal{X}}$   The identity matrix acting on $\mathcal{X}$.

$M^*$   The adjoint of a linear mapping M. If M is a matrix or column vector then $M^*$ is simply the
transpose of M.

$\langle A, B \rangle$   The matrix inner product, defined as $\mathrm{Tr}(A^* B)$. This inner product is defined only when the
dimensions of A, B are equal. If A, B are vectors then $\langle A, B \rangle$ is called the vector inner product.

$\le, \ge$   Matrix inequalities are entrywise.

$\bar{c}$   Given a bit $c \in \{0, 1\}$, the complement $\bar{c}$ is given by $\bar{c} = 1$ if $c = 0$, otherwise $\bar{c} = 0$.



2.3 Min-max formalism for interactive proofs with competing provers 

Given a fixed two-turn verifier and a fixed input string x, let $\pi_{ij}$ denote the probability with which the
verifier asks questions $i = (i_0, i_1)$ to Team Alice and $j = (j_0, j_1)$ to Team Bob. For each 4-tuple (i, j) of
questions to the provers let $v_{ij} \in \mathcal{A}_{01}\mathcal{B}_{01}$ denote the 0-1 vector of payouts to Team Bob. That is, for each
$k = (k_0, k_1)$ and each $l = (l_0, l_1)$ the (k, l)th entry of $v_{ij}$ is either zero or one according to whether the
verifier accepts or rejects x in the event that the verifier asks questions (i, j) to the teams and they respond
with answers (k, l).$^{1,2}$ Consider the entrywise nonnegative matrix

$$V : \mathcal{S}_{01}\mathcal{T}_{01} \to \mathcal{A}_{01}\mathcal{B}_{01}$$

whose (i, j)th column is $\pi_{ij} v_{ij}$. This matrix uniquely specifies the actions of the verifier.

Strategies for the teams are specified as follows. For each pair i of questions let $a_i \in \mathcal{A}_{01}$ denote the
probability vector of Team Alice's responses to i. That is, for each pair k of answers the kth entry of $a_i$
denotes the probability with which Team Alice replies with answers k given that questions i were asked.
Thus, the actions of Team Alice are uniquely specified by the stochastic matrix

$$A : \mathcal{S}_{01} \to \mathcal{A}_{01}$$

whose ith column is $a_i$. Similarly, for each pair j of questions let $b_j \in \mathcal{B}_{01}$ denote the probability vector of
Team Bob's responses to j. The actions of Team Bob are uniquely specified by the stochastic matrix

$$B : \mathcal{T}_{01} \to \mathcal{B}_{01}$$

whose jth column is $b_j$. Not every stochastic matrix denotes a valid no-signaling strategy for the teams.
Criteria for no-signaling strategies are discussed in Section 2.5. For now, it suffices to note that the set of all
strategies available to each team is a compact convex subset of stochastic matrices.

$^1$ One could consider a more general referee in which the payouts are awarded probabilistically so that each entry of $v_{ij}$ lies in
the interval [0, 1]. But it is easily seen that this model is equivalent to the one we have just described.

$^2$ The payout vector $v_{ij}$ is defined so that 0 indicates acceptance of x while 1 indicates rejection. This arbitrary choice is
opposite of convention, but it better facilitates the forthcoming presentation of our multiplicative weights update algorithm.

Conditioned on the verifier asking questions (i, j), it is clear that the probability of rejection is given by
the vector inner product

$$\langle v_{ij}, a_i \otimes b_j \rangle.$$

It follows that the probability of rejection, taken over all questions, given strategies A for Team
Alice and B for Team Bob is given by the matrix inner product

$$\Pr[V \text{ rejects } x \mid A, B] = \langle V, A \otimes B \rangle = \sum_{i,j} \pi_{ij} \langle v_{ij}, a_i \otimes b_j \rangle.$$

Of course, Team Bob wishes to maximize this quantity while Team Alice wishes to minimize it.
Given that the above inner product is bilinear in (A, B) and that the sets of admissible strategies for the
two teams are compact and convex, it follows from standard min-max theorems [Vil38, Fan53] that every
interactive proof with verifier V has an equilibrium value, which we denote by $\lambda(V)$, given by

$$\lambda(V) = \min_A \max_B \langle V, A \otimes B \rangle = \max_B \min_A \langle V, A \otimes B \rangle,$$

where the minimum is over all no-signaling matrices $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ and the maximum is over all
no-signaling matrices $B : \mathcal{T}_{01} \to \mathcal{B}_{01}$. In particular, for every protocol there exists at least one equilibrium
point $(A^\star, B^\star)$ with the property that

$$\langle V, A^\star \otimes B \rangle \le \lambda(V) \text{ for all } B,$$
$$\langle V, A \otimes B^\star \rangle \ge \lambda(V) \text{ for all } A.$$

Thus, the strategy $B^\star$ always ensures maximum likelihood of rejection, while $A^\star$ always ensures minimum
likelihood of rejection.
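The rejection probability $\langle V, A \otimes B \rangle = \sum_{i,j} \pi_{ij} \langle v_{ij}, a_i \otimes b_j \rangle$ can be evaluated directly from its definition. The sketch below does so for a toy verifier; the dictionary representation, the function name, and the toy verifier itself (one question per team, rejecting exactly when the two answers agree) are assumptions for illustration, and questions and answers are treated as opaque labels rather than pairs.

```python
from fractions import Fraction

def rejection_probability(pi, v, A, B):
    """Evaluate sum_{i,j} pi[i,j] * <v[i,j], a_i (x) b_j>.

    pi[(i, j)]      probability that the verifier asks (i, j)
    v[(i, j)][(k, l)] is 1 if the verifier rejects on answers (k, l), else 0
    A[i][k], B[j][l]  answer probabilities of Team Alice and Team Bob
    """
    total = Fraction(0)
    for (i, j), weight in pi.items():
        for (k, l), rejects in v[(i, j)].items():
            total += weight * rejects * A[i][k] * B[j][l]
    return total

half = Fraction(1, 2)
pi = {(0, 0): Fraction(1)}                       # a single question pair
v = {(0, 0): {(k, l): int(k == l) for k in (0, 1) for l in (0, 1)}}
uniform = {0: {0: half, 1: half}}                # answer 0 or 1 with probability 1/2
always_zero = {0: {0: Fraction(1), 1: Fraction(0)}}

print(rejection_probability(pi, v, uniform, uniform))
print(rejection_probability(pi, v, always_zero, always_zero))
```

In this toy game a uniformly random answer by Team Alice holds the rejection probability at 1/2 against any strategy for Team Bob, which illustrates the role of the equilibrium strategy $A^\star$.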

This min-max theorem applies to every min-max expression considered throughout this paper.
Henceforth we do not bother to explicitly remark upon this fact. Here and throughout the paper we adopt the
convention that for any min-max problem of the form

$$\nu(g) = \min_{a \in \mathbf{A}} \max_{b \in \mathbf{B}} g(a, b)$$

elements $a \in \mathbf{A}$ and $b \in \mathbf{B}$ are δ-optimal if

$$g(a, b) \le \nu(g) + \delta \text{ for all } b \in \mathbf{B},$$
$$g(a, b) \ge \nu(g) - \delta \text{ for all } a \in \mathbf{A}.$$

Elements that are 0-optimal, such as $A^\star, B^\star$ above, are simply called optimal.

2.4 Notation for marginal distributions

Before we discuss no-signaling strategies in detail it is beneficial to introduce notation for marginal
probability distributions that will be used throughout the remainder of this paper. Suppose, for instance, that
$a \in \mathcal{A}_{01}$ is a probability vector of answers from Team Alice to some question from the verifier. We let
$\mathrm{mar}_{\mathcal{A}_1}(a) \in \mathcal{A}_0$ denote the probability vector for the marginal distribution on answers from the prover
Alice$_0$. Basic probability theory dictates that the mapping $\mathrm{mar}_{\mathcal{A}_1}$ satisfy

$$k_0\text{th entry of } \mathrm{mar}_{\mathcal{A}_1}(a) = \sum_{k_1 = 1}^{\dim(\mathcal{A}_1)} (k_0, k_1)\text{th entry of } a.$$

Of course, this mapping may be extended to arbitrary real vectors. For arbitrary spaces $\mathcal{X}, \mathcal{Y}$ the linear
mapping $\mathrm{mar}_{\mathcal{Y}}$ is defined by

$$\mathrm{mar}_{\mathcal{Y}} : \mathcal{X}\mathcal{Y} \to \mathcal{X} : x \otimes y \mapsto \langle e_{\mathcal{Y}}, y \rangle x.$$

(The matrix representation of $\mathrm{mar}_{\mathcal{Y}}$ is $I_{\mathcal{X}} \otimes e_{\mathcal{Y}}^*$.) While this mapping is primarily intended to denote marginal
probability distributions, we will have occasion to use it on non-probability vectors in this paper.

The mapping $\mathrm{mar}_{\mathcal{Y}}$ is to vectors as the partial trace is to square matrices. Readers familiar with quantum
information know that the state of a quantum register can be computed from a joint state of several registers
via the partial trace. So too with probability distributions: the distribution on states of a classical register
can be computed from a joint distribution on states of several registers via $\mathrm{mar}_{\mathcal{Y}}$.

The mapping $\mathrm{mar}_{\mathcal{Y}}$ extends naturally from vectors to matrices by applying $\mathrm{mar}_{\mathcal{Y}}$ to each column:

$$i\text{th column of } \mathrm{mar}_{\mathcal{Y}}(A) = \mathrm{mar}_{\mathcal{Y}}(i\text{th column of } A).$$

So, for example, if Team Alice acts according to the stochastic matrix A then the stochastic matrix

$$\mathrm{mar}_{\mathcal{A}_1}(A) : \mathcal{S}_{01} \to \mathcal{A}_0$$

describes the "marginal" strategy for prover Alice$_0$. That is, the $(i_0, i_1)$th column of $\mathrm{mar}_{\mathcal{A}_1}(A)$ is the
distribution on answers $k_0$ from Alice$_0$ given questions $(i_0, i_1)$ from the verifier.
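The marginal mapping can be sketched in a few lines. The code below assumes that a vector on a product space $\mathcal{X}\mathcal{Y}$ is stored as a flat list in Kronecker order, so that the entry for the pair (x, y) sits at position `x * dim_y + y` with zero-based indices; the function names and this storage convention are assumptions for illustration.

```python
from fractions import Fraction

def mar(v, dim_x, dim_y):
    """Sum out the Y coordinate of a vector v on X (x) Y.

    v is a flat list of length dim_x * dim_y in Kronecker order,
    so the (x, y) entry is v[x * dim_y + y].
    """
    return [sum(v[x * dim_y + y] for y in range(dim_y)) for x in range(dim_x)]

def mar_columns(columns, dim_x, dim_y):
    """Apply mar to each column of a matrix given as a list of columns."""
    return [mar(col, dim_x, dim_y) for col in columns]

# A product distribution x (x) y has marginal x, since <e_Y, y> = 1.
x = [Fraction(1, 4), Fraction(3, 4)]
y = [Fraction(1, 3), Fraction(2, 3)]
joint = [a * b for a in x for b in y]
print(mar(joint, 2, 2))

# A perfectly correlated (non-product) distribution has a uniform marginal.
correlated = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2)]
print(mar(correlated, 2, 2))
```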

2.5 Characterization of no-signaling strategies 

Recall that a strategy for Team Alice is no-signaling if for both values of the bit $c \in \{0, 1\}$ the marginal
distribution on answers $k_c$ from Alice$_c$ does not depend on the question $i_{\bar{c}}$ asked of Alice$_{\bar{c}}$.

In terms of Team Alice's stochastic matrix A, this condition means that for each $i_c$ the $(i_0, i_1)$th column
of $\mathrm{mar}_{\mathcal{A}_{\bar{c}}}(A)$ is identical for all subindices $i_{\bar{c}}$. Letting $a_{i_c}$ denote this fixed probability vector and letting
$A_c : \mathcal{S}_c \to \mathcal{A}_c$ denote the stochastic matrix whose columns are $a_{i_c}$, the above condition can be written as

$$\mathrm{mar}_{\mathcal{A}_{\bar{c}}}(A) = A_c \otimes e_{\mathcal{S}_{\bar{c}}}^*.$$

We have just proven the following simple proposition.

Proposition 2 (Characterization of no-signaling strategies). A stochastic matrix $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ denotes
a no-signaling strategy for Team Alice if and only if for both values of the bit $c \in \{0, 1\}$ there exists a
stochastic matrix $A_c : \mathcal{S}_c \to \mathcal{A}_c$ such that

$$\mathrm{mar}_{\mathcal{A}_{\bar{c}}}(A) = A_c \otimes e_{\mathcal{S}_{\bar{c}}}^*.$$

A similar characterization holds for Team Bob.

Stochastic matrices A meeting this condition are called no-signaling matrices. The matrices $A_c$ are said
to witness the fact that A is a no-signaling matrix. It follows immediately from Proposition 2 that the set of
all no-signaling strategies available to each team is compact and convex, a fact already used in Section 2.3
to assert the existence of optimal strategies for the teams.
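The characterization in Proposition 2 translates into a direct check: compute each prover's marginal for every question pair and confirm that it depends only on that prover's own question. The sketch below assumes a strategy is stored as a dictionary from question pairs $(i_0, i_1)$ to distributions over answer pairs $(k_0, k_1)$; this representation, the function names, and the two example strategies are illustrative assumptions. The first example is the well-known Popescu-Rohrlich box, which is not discussed in the text but is a standard no-signaling strategy that cannot be implemented without communication.

```python
from fractions import Fraction
from itertools import product

def marginal(strategy, c):
    """Marginal answer distribution of prover c for each question pair."""
    out = {}
    for i, dist in strategy.items():
        m = {}
        for k, prob in dist.items():
            m[k[c]] = m.get(k[c], 0) + prob
        # Drop zero entries so that equal distributions compare equal.
        out[i] = {answer: q for answer, q in m.items() if q != 0}
    return out

def is_no_signaling(strategy):
    """Check that prover c's marginal depends only on its own question i[c]."""
    for c in (0, 1):
        seen = {}
        for i, m in marginal(strategy, c).items():
            if seen.setdefault(i[c], m) != m:
                return False
    return True

def make_pr_box():
    """Answers are uniform subject to k0 XOR k1 == i0 AND i1."""
    bits = (0, 1)
    box = {}
    for i in product(bits, repeat=2):
        box[i] = {k: Fraction(1, 2) for k in product(bits, repeat=2)
                  if (k[0] ^ k[1]) == (i[0] & i[1])}
    return box

def make_signaling():
    """Prover 0 simply outputs prover 1's question, which requires signaling."""
    return {i: {(i[1], 0): Fraction(1)} for i in product((0, 1), repeat=2)}

print(is_no_signaling(make_pr_box()))
print(is_no_signaling(make_signaling()))
```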

3 A relaxed min-max problem with penalties 

As mentioned in the introduction, the MWUM in its simplest form solves min-max optimization problems
over probability vectors. We optimize over stochastic matrices for the teams by using the MWUM
simultaneously on each column of these matrices, a trick that works only for two-turn protocols, as we shall soon
see.

We noted in Section 2.3 that the no-signaling matrices available to the teams form a strict subset of the
stochastic matrices. In order to optimize only over no-signaling matrices, in this section we specify a new
min-max optimization problem $\mu(V)$ in which the teams may use arbitrary strategies but pay a penalty for
strategies that violate the no-signaling condition. By a careful choice of penalty, we remove the incentive
of the teams to select inadmissible strategies without ruining the precarious convergence properties of the
MWUM.

Some preliminary observations are given in Section 3.1 before the formal definition of the new min-max
problem $\mu(V)$ in Section 3.2. Equivalence of $\mu(V)$ and $\lambda(V)$ is proven in Section 3.3, with proofs of some
lemmas in Section 3.4.

3.1 Bounds on two-turn verifiers 

First, for ease of notation we let $\Phi_V$ denote the unique linear transformation satisfying

\[ \langle V, A \otimes B \rangle = \langle \Phi_V(A), B \rangle = \langle A, \Phi_V^*(B) \rangle \]

for all matrices $A$, $B$. Though a precise formula for $\Phi_V$ is of little use in this paper, for completeness we note that

\[ \Phi_V(A) = \mathrm{Tr}_{\mathcal{S}_{01}} \big( (A^* \otimes I_{\mathcal{B}_{01}}) V \big) \]
\[ \Phi_V^*(B) = \mathrm{Tr}_{\mathcal{T}_{01}} \big( (I_{\mathcal{A}_{01}} \otimes B^*) V \big) \]

where $\mathrm{Tr}_{\mathcal{S}_{01}}$ and $\mathrm{Tr}_{\mathcal{T}_{01}}$ denote partial trace transformations. At the risk of hijacking terminology from functional analysis, the matrix $\Phi_V(A)$ can be viewed as a partial inner product between $V$ and $A$. This matrix can also be viewed as a new two-turn verifier for Team Bob obtained by "hard-wiring" Team Alice's strategy $A$ into the original verifier $V$.

Next, let $p \in \mathcal{S}_{01}\mathcal{T}_{01}$ denote the probability vector for the distribution on questions asked by the verifier. In the notation of Section 2.3 the $(i,j)$th entry of $p$ is $\pi_{ij}$, the probability with which the verifier asks questions $i$ to Team Alice and $j$ to Team Bob. Let $p_{\mathrm{Alice}} \in \mathcal{S}_{01}$ denote the marginal distribution

\[ p_{\mathrm{Alice}} = \mathrm{mar}_{\mathcal{T}_{01}}(p) \]

on questions to Team Alice, so that the $i$th entry of $p_{\mathrm{Alice}}$ is $\sum_j \pi_{ij}$. It is not hard to see that

\[ V \le e_{\mathcal{A}_{01}\mathcal{B}_{01}} \, p^* \]

with equality achieved in the extreme case that each of the verifier's payout vectors $v_{ij}$ is equal to the all-ones vector $e_{\mathcal{A}_{01}\mathcal{B}_{01}}$. (Recall that matrix inequalities are entrywise.) Similarly, it is easy to prove analogous inequalities for $\Phi_V(A)$, $\Phi_V^*(B)$. For example:

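Proposition 3. For any stochastic matrix $B : \mathcal{T}_{01} \to \mathcal{B}_{01}$ it holds that $\Phi_V^*(B) \le e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^*$.

Proof. Let $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ be any nonnegative matrix and let $a_i$, $b_j$ denote the columns of $A$, $B$, respectively. Then

\[ \langle A, \Phi_V^*(B) \rangle = \langle V, A \otimes B \rangle \le \langle e_{\mathcal{A}_{01}\mathcal{B}_{01}} p^*, A \otimes B \rangle = \sum_{i,j} \pi_{ij} \langle e_{\mathcal{A}_{01}}, a_i \rangle \langle e_{\mathcal{B}_{01}}, b_j \rangle. \]

As $B$ is stochastic it must be that $\langle e_{\mathcal{B}_{01}}, b_j \rangle = 1$ for each $j$. The above expression then simplifies to

\[ \sum_i \Big( \sum_j \pi_{ij} \Big) \langle e_{\mathcal{A}_{01}}, a_i \rangle = \langle A, e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^* \rangle. \]

As this inequality holds for all nonnegative matrices $A$ it must be that $\Phi_V^*(B) \le e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^*$ as claimed. □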
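The bound of Proposition 3 can be checked on a toy verifier. In the sketch below the nested-list layout `V[i][j][k][l]`, the uniform question distribution, and the payout rule (pay out when the two answers agree) are all assumptions chosen for illustration. Team Alice's question pair and answer pair are flattened into single indices `i` and `k`.

```python
def phi_star(V, B):
    """Return Phi_V^*(B) as a nested list indexed [i][k].

    V[i][j][k][l] is the verifier-matrix entry for questions (i, j) and
    answers (k, l); B[j][l] is Bob's probability of answering l to question j.
    """
    n_l = len(B[0])
    return [[sum(V_ij[k][l] * B[j][l]
                 for j, V_ij in enumerate(V_i) for l in range(n_l))
             for k in range(len(V_i[0]))]
            for V_i in V]

# Toy verifier: uniform questions (pi_ij = 1/4), payout 1 when k == l.
V = [[[[0.25 if k == l else 0.0 for l in range(2)] for k in range(2)]
      for j in range(2)] for i in range(2)]
# Bob deterministically answers l = j.
B = [[1.0 if l == j else 0.0 for l in range(2)] for j in range(2)]

p_alice = [sum(0.25 for j in range(2)) for i in range(2)]
phi = phi_star(V, B)
print(phi, p_alice)
```

Every entry in column `i` of the result is at most `p_alice[i]`, which is the entrywise inequality asserted by the proposition.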

3.2 Definition of the relaxed min-max problem 

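The relaxation $\mu(V)$ of $\lambda(V)$ is defined by

\[ \mu(V) = \min_{(A,A_0,A_1)} \; \max_{(B,\Pi_0,\Pi_1)} \; \langle f_V(A,A_0,A_1), (B,\Pi_0,\Pi_1) \rangle \]

where the triples $(A,A_0,A_1)$ and $(B,\Pi_0,\Pi_1)$ have the form

\[ \begin{array}{lll}
A : \mathcal{S}_{01} \to \mathcal{A}_{01} & \text{any stochastic} & \\
A_c : \mathcal{S}_c \to \mathcal{A}_c & \text{any stochastic} & c \in \{0,1\} \\
B : \mathcal{T}_{01} \to \mathcal{B}_{01} & \text{no-signaling only} & \\
\Pi_c : \mathcal{S}_{01} \to \mathcal{A}_c & 0 \le \Pi_c \le e_{\mathcal{A}_c} p_{\mathrm{Alice}}^* & c \in \{0,1\}
\end{array} \]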

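The linear mapping $f_V$ appearing in the inner product (and its adjoint) is defined by

\[ f_V : (A,A_0,A_1) \mapsto \big( \Phi_V(A), \; \mathrm{mar}_{\mathcal{A}_1}(A) - A_0 \otimes e_{\mathcal{S}_1}^*, \; \mathrm{mar}_{\mathcal{A}_0}(A) - A_1 \otimes e_{\mathcal{S}_0}^* \big) \]
\[ f_V^* : (B,\Pi_0,\Pi_1) \mapsto \big( \Phi_V^*(B) + e_{\mathcal{A}_1} \otimes \Pi_0 + e_{\mathcal{A}_0} \otimes \Pi_1, \; -\Pi_0 (I_{\mathcal{S}_0} \otimes e_{\mathcal{S}_1}), \; -\Pi_1 (I_{\mathcal{S}_1} \otimes e_{\mathcal{S}_0}) \big) \]

so that

\[ \langle f_V(A,A_0,A_1), (B,\Pi_0,\Pi_1) \rangle = \langle V, A \otimes B \rangle + \sum_{c \in \{0,1\}} \langle \mathrm{mar}_{\mathcal{A}_{\bar c}}(A) - A_c \otimes e_{\mathcal{S}_{\bar c}}^*, \; \Pi_c \rangle \]

for all $(A,A_0,A_1)$ and all $(B,\Pi_0,\Pi_1)$. (The adjoint mapping $f_V^*$ is not used until the algorithm of Figure 2 and its proof of correctness in Proposition 8.)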
Intuition



Some explanation is in order. As with the original min-max problem $\lambda(V)$, the matrices $A$ and $B$ represent the strategies employed by the teams. Note, however, that in the definition of $\mu(V)$ Team Alice is now free to choose among arbitrary stochastic matrices for its strategy. The matrices $A_0, A_1$ for Team Alice are purported witnesses to the claim that $A$ is a valid no-signaling matrix.

For the moment, we are concerned with relaxing the domain only of Team Alice's strategies, so Bob's strategy $B$ must still be no-signaling. Bob's strategies will be addressed in Section 4.2. The matrices $\Pi_0, \Pi_1$ for Team Bob are penalty matrices: they are the means by which Team Bob penalizes Team Alice according to the extent that $A_0, A_1$ are false witnesses to the claim that $A$ is no-signaling.

The new objective function $\langle f_V(A,A_0,A_1), (B,\Pi_0,\Pi_1) \rangle$ equals the old objective function $\langle V, A \otimes B \rangle$ plus two penalty terms. If $A$ is not a no-signaling matrix then the difference matrix

\[ \Delta_c = \mathrm{mar}_{\mathcal{A}_{\bar c}}(A) - A_c \otimes e_{\mathcal{S}_{\bar c}}^* \]

must be nonzero for at least one $c$. In this case, Bob selects $\Pi_c$ to pick out the positive entries of $\Delta_c$, which are then added to the verifier's probability of rejection.

Let us informally explain why the restriction $0 \le \Pi_c \le e_{\mathcal{A}_c} p_{\mathrm{Alice}}^*$ on penalty matrices is sufficient to remove Team Alice's incentive to cheat. Suppose the $k_c$th entry of the $i$th column of the difference matrix $\Delta_c$ is a positive real number $\delta > 0$ and suppose that $A'$ is a valid no-signaling matrix witnessed by $A_0, A_1$. Since the verifier asks questions $i$ of Team Alice with probability $\pi_i$, it must be that, when selecting the probability with which to answer $k_c$, the advantage gained by Team Alice from using the inadmissible strategy $A$ instead of the no-signaling strategy $A'$ is at most $\delta\pi_i$. By selecting a penalty matrix $\Pi_c$ so that the $k_c$th entry of the $i$th column of $\Pi_c$ is equal to $\pi_i$, Team Bob adds precisely the quantity $\delta\pi_i$ to the verifier's probability of rejection, thus eliminating the advantage obtained by Team Alice in acting according to $A$ instead of $A'$ for this particular choice of questions $i$ and answer $k_c$ from Alice$_c$.

Repeating this logic for all entries $(i,k_c)$ of $\Delta_c$, we find that Team Bob should select the penalty matrix $\Pi_c$ so that the $(i,k_c)$th entry is either zero or $\pi_i$ according to whether the corresponding entry of $\Delta_c$ is nonpositive or positive. A penalty matrix of this form is called optimal for $(A,A_0,A_1)$ and satisfies

\[ \langle \Delta_c, \Pi_c \rangle = \langle \Delta_c^+, e_{\mathcal{A}_c} p_{\mathrm{Alice}}^* \rangle \]

where $\Delta_c^+$ is the positive part of $\Delta_c$. (Here the positive part of a real matrix $X$ is the matrix $X^+$ with the property that if $x$ is any entry of $X$ then the corresponding entry of $X^+$ is $\max\{0, x\}$.)
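The rule for optimal penalties translates directly into code. The following sketch assumes the difference matrix $\Delta_c$ is given as a dictionary from Alice's questions `i` to columns (lists indexed by the answer $k_c$) and that `p_alice[i]` holds $\pi_i$. The names and the numbers are hypothetical.

```python
def optimal_penalty(delta, p_alice):
    """Entry (i, k) is pi_i where delta is positive and zero elsewhere."""
    return {i: [p_alice[i] if x > 0 else 0.0 for x in col]
            for i, col in delta.items()}

def positive_part(delta):
    """Entrywise max{0, x}."""
    return {i: [max(0.0, x) for x in col] for i, col in delta.items()}

def inner(X, Y):
    """Entrywise inner product of two matrices stored column by column."""
    return sum(x * y for i in X for x, y in zip(X[i], Y[i]))

# Hypothetical difference matrix: only the column for question 0 is nonzero.
delta = {0: [0.25, -0.25], 1: [0.0, 0.0]}
p_alice = {0: 0.5, 1: 0.5}

penalty = optimal_penalty(delta, p_alice)
all_pi = {i: [p_alice[i]] * len(col) for i, col in delta.items()}  # e p_Alice^*

lhs = inner(delta, penalty)               # <Delta_c, Pi_c>
rhs = inner(positive_part(delta), all_pi)  # <Delta_c^+, e p_Alice^*>
print(lhs, rhs)
```

Both sides evaluate to the same number, matching the displayed identity: the optimal penalty charges Team Alice $\pi_i$ per unit of positive excess and nothing for negative entries.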

3.3 Equivalence of the two min-max problems 

We are now ready to prove the desired "rounding theorem" mentioned in the introduction, a corollary of which is the equivalence of the min-max problems $\mu(V)$ and $\lambda(V)$ (Corollary 4.1). The theorem employs two lemmas and their corollaries, the proofs of which appear below in Section 3.4.

Theorem 4 (Rounding theorem). Let $(A,A_0,A_1)$ be a feasible solution for $\mu(V)$ and let $\Pi_0^\star, \Pi_1^\star$ be optimal penalties for $(A,A_0,A_1)$. There exists a no-signaling matrix $A_{\mathrm{ns}}$ witnessed by $A_0, A_1$ such that for all stochastic matrices $B$ it holds that

\[ \langle V, A_{\mathrm{ns}} \otimes B \rangle \le \langle f_V(A,A_0,A_1), (B,\Pi_0^\star,\Pi_1^\star) \rangle. \]

Moreover, $A_{\mathrm{ns}}$ can be computed efficiently in parallel given $(A,A_0,A_1)$.
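Proof. For both $c \in \{0,1\}$ let $\Delta_c^+$ be the positive part of $\mathrm{mar}_{\mathcal{A}_{\bar c}}(A) - A_c \otimes e_{\mathcal{S}_{\bar c}}^*$ and observe that

\[ \Delta_c^+ \le \mathrm{mar}_{\mathcal{A}_{\bar c}}(A). \]

By Corollary 5.1 below there exists a preimage $D_0^+ \ge 0$ of $\Delta_0^+$ with

\[ A - D_0^+ \ge 0 \]
\[ \mathrm{mar}_{\mathcal{A}_1}(D_0^+) = \Delta_0^+. \]

Let $\Gamma_1^+$ be the positive part of $\mathrm{mar}_{\mathcal{A}_0}(A - D_0^+) - A_1 \otimes e_{\mathcal{S}_0}^*$. As with $\Delta_c^+$ above, observe that

\[ \Gamma_1^+ \le \mathrm{mar}_{\mathcal{A}_0}(A - D_0^+). \]

(Moreover, it is easy to see that $\Gamma_1^+ \le \Delta_1^+$, a fact we employ later in this proof.) Apply Corollary 5.1 again to obtain a preimage $C_1^+ \ge 0$ of $\Gamma_1^+$ with

\[ A - D_0^+ - C_1^+ \ge 0 \]
\[ \mathrm{mar}_{\mathcal{A}_0}(C_1^+) = \Gamma_1^+. \]

Thus, we have a matrix $A - D_0^+ - C_1^+ \ge 0$ such that for both $c \in \{0,1\}$ it holds that

\[ \mathrm{mar}_{\mathcal{A}_{\bar c}}(A - D_0^+ - C_1^+) \le A_c \otimes e_{\mathcal{S}_{\bar c}}^*. \]

Hence there exist nonnegative matrices $T_c : \mathcal{S}_{01} \to \mathcal{A}_c$ with

\[ \mathrm{mar}_{\mathcal{A}_{\bar c}}(A - D_0^+ - C_1^+) + T_c = A_c \otimes e_{\mathcal{S}_{\bar c}}^*. \]

Applying $\mathrm{mar}_{\mathcal{A}_c}$ to both sides of this equation we see that $\mathrm{mar}_{\mathcal{A}_0}(T_0) = \mathrm{mar}_{\mathcal{A}_1}(T_1)$. By Corollary 6.1 below there exists a nonnegative matrix $T : \mathcal{S}_{01} \to \mathcal{A}_{01}$ with $\mathrm{mar}_{\mathcal{A}_{\bar c}}(T) = T_c$ for both $c \in \{0,1\}$. The desired no-signaling matrix $A_{\mathrm{ns}}$ is given by

\[ A_{\mathrm{ns}} = A - D_0^+ - C_1^+ + T. \]

As $D_0^+$, $C_1^+$, and $T$ can be computed efficiently in parallel, so too can $A_{\mathrm{ns}}$. To see that $A_{\mathrm{ns}}$ is a no-signaling matrix witnessed by $A_0, A_1$ it suffices to observe that

\[ \mathrm{mar}_{\mathcal{A}_{\bar c}}(A_{\mathrm{ns}}) = \mathrm{mar}_{\mathcal{A}_{\bar c}}(A - D_0^+ - C_1^+) + T_c = A_c \otimes e_{\mathcal{S}_{\bar c}}^*. \]

It remains only to verify the stated inequality. To this end, we have

\[ \begin{aligned}
\langle V, A_{\mathrm{ns}} \otimes B \rangle &= \langle A, \Phi_V^*(B) \rangle - \langle D_0^+ + C_1^+, \Phi_V^*(B) \rangle + \langle T, \Phi_V^*(B) \rangle \\
&\le \langle A, \Phi_V^*(B) \rangle + \langle T, \Phi_V^*(B) \rangle \\
&\le \langle A, \Phi_V^*(B) \rangle + \langle T, e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^* \rangle.
\end{aligned} \]

As $A_{\mathrm{ns}}$ and $A$ are both stochastic matrices, it must be that $D_0^+ + C_1^+$ and $T$ have the same column sums. As $\langle T, e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^* \rangle$ equals the sum of the column sums of $T$ weighted according to $p_{\mathrm{Alice}}$, the matrix $T$ can be replaced by $D_0^+ + C_1^+$ without affecting this inner product. That is

\[ \langle T, e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^* \rangle = \langle D_0^+ + C_1^+, e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^* \rangle. \]

Expanding the right side of this equality we obtain

\[ \langle \mathrm{mar}_{\mathcal{A}_1}(D_0^+), e_{\mathcal{A}_0} p_{\mathrm{Alice}}^* \rangle + \langle \mathrm{mar}_{\mathcal{A}_0}(C_1^+), e_{\mathcal{A}_1} p_{\mathrm{Alice}}^* \rangle = \langle \Delta_0^+, e_{\mathcal{A}_0} p_{\mathrm{Alice}}^* \rangle + \langle \Gamma_1^+, e_{\mathcal{A}_1} p_{\mathrm{Alice}}^* \rangle. \]

As $\Gamma_1^+ \le \Delta_1^+$ this quantity is at most

\[ \langle \Delta_0^+, e_{\mathcal{A}_0} p_{\mathrm{Alice}}^* \rangle + \langle \Delta_1^+, e_{\mathcal{A}_1} p_{\mathrm{Alice}}^* \rangle. \]

Putting everything together, we have

\[ \begin{aligned}
\langle V, A_{\mathrm{ns}} \otimes B \rangle &\le \langle A, \Phi_V^*(B) \rangle + \langle \Delta_0^+, e_{\mathcal{A}_0} p_{\mathrm{Alice}}^* \rangle + \langle \Delta_1^+, e_{\mathcal{A}_1} p_{\mathrm{Alice}}^* \rangle \\
&= \langle A, \Phi_V^*(B) \rangle + \langle \mathrm{mar}_{\mathcal{A}_1}(A) - A_0 \otimes e_{\mathcal{S}_1}^*, \Pi_0^\star \rangle + \langle \mathrm{mar}_{\mathcal{A}_0}(A) - A_1 \otimes e_{\mathcal{S}_0}^*, \Pi_1^\star \rangle \\
&= \langle f_V(A,A_0,A_1), (B,\Pi_0^\star,\Pi_1^\star) \rangle
\end{aligned} \]

as desired. □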
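Because every step of the rounding acts column by column, the construction can be sketched for a single column of $A$. In the Python sketch below the column is stored as a nested list `a[k0][k1]`, and `a0`, `a1` are the corresponding columns of the witnesses $A_0$, $A_1$. The helpers `preimage` and `joint` follow the constructions of Lemma 5 and Lemma 6 in Section 3.4. All function names and the example numbers are assumptions made for illustration.

```python
def row_sums(m):
    return [sum(row) for row in m]

def col_sums(m):
    return [sum(col) for col in zip(*m)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def combine(x, y, sign):
    """Entrywise x + sign * y."""
    return [[p + sign * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def preimage(a, delta):
    """Lemma 5: d <= a entrywise with row sums equal to delta."""
    out = []
    for row, dk in zip(a, delta):
        s = sum(row)
        out.append([x * dk / s if s != 0 else 0.0 for x in row])
    return out

def joint(t0, t1):
    """Lemma 6: nonnegative matrix with row sums t0 and column sums t1."""
    s = sum(t0)
    if s == 0:
        return [[0.0 for _ in t1] for _ in t0]
    return [[p * q / s for q in t1] for p in t0]

def round_column(a, a0, a1):
    """Round one column so that its marginals equal a0 (on k0) and a1 (on k1)."""
    d_plus = [max(0.0, s - w) for s, w in zip(row_sums(a), a0)]   # Delta_0^+
    a = combine(a, preimage(a, d_plus), -1)                       # A - D_0^+
    g_plus = [max(0.0, s - w) for s, w in zip(col_sums(a), a1)]   # Gamma_1^+
    a = combine(a, transpose(preimage(transpose(a), g_plus)), -1) # ... - C_1^+
    t0 = [w - s for w, s in zip(a0, row_sums(a))]                 # T_0
    t1 = [w - s for w, s in zip(a1, col_sums(a))]                 # T_1
    return combine(a, joint(t0, t1), +1)                          # ... + T

a = [[0.5, 0.0], [0.0, 0.5]]   # perfectly correlated answers
a_ns = round_column(a, [0.75, 0.25], [0.5, 0.5])
print(a_ns, row_sums(a_ns), col_sums(a_ns))
```

The rounded column remains a probability vector and its two marginals match the purported witnesses, which is exactly the property that makes $A_{\mathrm{ns}}$ a no-signaling matrix witnessed by $A_0, A_1$.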
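Corollary 4.1 (Equivalence of min-max problems). The following hold for any verifier $V$ and any $\delta > 0$:

1. $\mu(V) = \lambda(V)$.

2. If $(B^\star, \Pi_0^\star, \Pi_1^\star)$ is $\delta$-optimal for $\mu(V)$ then $B^\star$ is $\delta$-optimal for $\lambda(V)$.

3. If $(A^\star, A_0^\star, A_1^\star)$ is $\delta$-optimal for $\mu(V)$ then there exists $A_{\mathrm{ns}}$ such that $A_{\mathrm{ns}}$ is $\delta$-optimal for $\lambda(V)$ and $A_{\mathrm{ns}}$ can be computed efficiently in parallel given $(A^\star, A_0^\star, A_1^\star)$.

Proof. We begin with item 1. It is easy to prove $\lambda(V) \ge \mu(V)$: let $A^\star$ be optimal for $\lambda(V)$, let $A_0, A_1$ witness the fact that $A^\star$ is no-signaling, and let $(B^\star, \Pi_0^\star, \Pi_1^\star)$ be optimal for $\mu(V)$. Then

\[ \lambda(V) \ge \langle V, A^\star \otimes B^\star \rangle = \langle f_V(A^\star, A_0, A_1), (B^\star, \Pi_0^\star, \Pi_1^\star) \rangle \ge \mu(V). \]

For the reverse inequality, let $(A^\star, A_0^\star, A_1^\star)$ be optimal for $\mu(V)$, let $\Pi_0^{\star\star}, \Pi_1^{\star\star}$ be optimal penalties for $(A^\star, A_0^\star, A_1^\star)$, and let $B^\star$ be optimal for $\lambda(V)$. By Theorem 4 there exists a no-signaling matrix $A_{\mathrm{ns}}$ witnessed by $A_0^\star, A_1^\star$ such that

\[ \langle V, A_{\mathrm{ns}} \otimes B^\star \rangle \le \langle f_V(A^\star, A_0^\star, A_1^\star), (B^\star, \Pi_0^{\star\star}, \Pi_1^{\star\star}) \rangle. \]

The desired inequality $\lambda(V) \le \mu(V)$ follows from the fact that the left side is at least $\lambda(V)$ and the right side is at most $\mu(V)$. The proof of item 1 is complete.

Item 2 follows easily from item 1. Let $A$ be a no-signaling matrix and let $A_0, A_1$ witness this fact. Then

\[ \lambda(V) - \delta = \mu(V) - \delta \le \langle f_V(A, A_0, A_1), (B^\star, \Pi_0^\star, \Pi_1^\star) \rangle = \langle V, A \otimes B^\star \rangle. \]

As $A$ was chosen arbitrarily, it follows that $B^\star$ is $\delta$-optimal for $\lambda(V)$.

For item 3 let $B$ be any no-signaling matrix and let $\Pi_0^{\star\star}, \Pi_1^{\star\star}$ be optimal penalties for the given $\delta$-optimal solution $(A^\star, A_0^\star, A_1^\star)$. By Theorem 4 there exists a no-signaling matrix $A_{\mathrm{ns}}$ witnessed by $A_0^\star, A_1^\star$ such that

\[ \langle V, A_{\mathrm{ns}} \otimes B \rangle \le \langle f_V(A^\star, A_0^\star, A_1^\star), (B, \Pi_0^{\star\star}, \Pi_1^{\star\star}) \rangle \le \mu(V) + \delta = \lambda(V) + \delta. \]

As $B$ was chosen arbitrarily, it follows that $A_{\mathrm{ns}}$ is $\delta$-optimal for $\lambda(V)$. □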
3.4 Lemmas used in the rounding theorem

The lemmas used in the proof of Theorem 4 are not difficult. It is quite likely that some form of these lemmas is part of computer science "folklore," though our notation may be nonstandard.

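Lemma 5 (Small marginals have small preimages). Let $a \in \mathcal{A}_{01}$ and $\delta \in \mathcal{A}_0$ be nonnegative vectors with $\delta \le \mathrm{mar}_{\mathcal{A}_1}(a)$. There exists a nonnegative vector $d \in \mathcal{A}_{01}$ with $d \le a$ and $\mathrm{mar}_{\mathcal{A}_1}(d) = \delta$. Moreover, $d$ can be computed efficiently in parallel given $a, \delta$.

Proof. Let $a_{(k_0,k_1)}$ and $\delta_{k_0}$ denote the nonnegative entries of $a$ and $\delta$, respectively. Let $s_{k_0}$ denote the $k_0$th entry of $\mathrm{mar}_{\mathcal{A}_1}(a)$ so that

\[ s_{k_0} = \sum_{k_1=1}^{\dim(\mathcal{A}_1)} a_{(k_0,k_1)}. \]

The desired vector $d$ has entries $d_{(k_0,k_1)}$ given by

\[ d_{(k_0,k_1)} = \begin{cases} \dfrac{\delta_{k_0}}{s_{k_0}} \, a_{(k_0,k_1)} & \text{when } s_{k_0} \ne 0 \\ 0 & \text{otherwise.} \end{cases} \]

(Intuitively, the weight $\delta_{k_0}$ required of $\sum_{k_1} d_{(k_0,k_1)}$ is "spread out" over each $d_{(k_0,k_1)}$ proportionately according to $a_{(k_0,k_1)}$.) It is clear that this construction can be implemented efficiently in parallel.

Let us verify that $d \le a$. Observe that for the case $s_{k_0} \ne 0$ the ratio $\delta_{k_0}/s_{k_0}$ is at most one because $\delta \le \mathrm{mar}_{\mathcal{A}_1}(a)$. Then

\[ d_{(k_0,k_1)} = a_{(k_0,k_1)} \frac{\delta_{k_0}}{s_{k_0}} \le a_{(k_0,k_1)} \]

as desired. Of course, if $s_{k_0} = 0$ then $d_{(k_0,k_1)} = 0$ by definition and hence $d_{(k_0,k_1)} \le a_{(k_0,k_1)}$ because $a \ge 0$.

Let us verify that $\mathrm{mar}_{\mathcal{A}_1}(d) = \delta$. For the case $s_{k_0} \ne 0$ the $k_0$th entry of $\mathrm{mar}_{\mathcal{A}_1}(d)$ is given by

\[ \sum_{k_1=1}^{\dim(\mathcal{A}_1)} d_{(k_0,k_1)} = \frac{\delta_{k_0}}{s_{k_0}} \sum_{k_1=1}^{\dim(\mathcal{A}_1)} a_{(k_0,k_1)} = \delta_{k_0} \]

as desired. As above, if $s_{k_0} = 0$ then by definition $d_{(k_0,k_1)} = 0$ for each $k_1$ and hence $\sum_{k_1} d_{(k_0,k_1)} = 0$. As $0 \le \delta_{k_0} \le s_{k_0}$ it must be that $\delta_{k_0} = 0$, too. □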

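Corollary 5.1. Let $A : \mathcal{S}_{01} \to \mathcal{A}_{01}$ and $\Delta : \mathcal{S}_{01} \to \mathcal{A}_0$ be nonnegative matrices with $\Delta \le \mathrm{mar}_{\mathcal{A}_1}(A)$. There exists a nonnegative matrix $D : \mathcal{S}_{01} \to \mathcal{A}_{01}$ with $D \le A$ and $\mathrm{mar}_{\mathcal{A}_1}(D) = \Delta$. Moreover, $D$ can be computed efficiently in parallel given $A, \Delta$.

Proof. Apply Lemma 5 to each of the columns of $A, \Delta$. □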
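The construction in the proof of Lemma 5 is a per-row rescaling and can be sketched as follows. The vector $a$ is stored as a nested list `a[k0][k1]` and `delta[k0]` is the target marginal. The function name and the numbers are hypothetical.

```python
def small_preimage(a, delta):
    """Return d with 0 <= d <= a entrywise and sum over k1 of d[k0][k1] = delta[k0].

    Assumes a is nonnegative and delta[k0] <= sum(a[k0]) for every k0.
    """
    d = []
    for row, dk in zip(a, delta):
        s = sum(row)  # s_{k0}, the k0-th entry of the marginal of a
        # Spread delta_{k0} over the row proportionately to a; a zero row stays zero.
        d.append([x * dk / s if s != 0 else 0.0 for x in row])
    return d

a = [[0.25, 0.25], [0.5, 0.0], [0.0, 0.0]]
delta = [0.25, 0.5, 0.0]
d = small_preimage(a, delta)
print(d)
```

Each row is handled independently of the others, which is the source of the "efficiently in parallel" claim. Applying the same function to every column of a matrix gives Corollary 5.1.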

Lemma 6 (Disjoint marginals are always consistent). For both $c \in \{0,1\}$ let $t_c \in \mathcal{A}_c$ be nonnegative vectors whose entries sum to the same value. There exists a nonnegative vector $t \in \mathcal{A}_{01}$ with $\mathrm{mar}_{\mathcal{A}_{\bar c}}(t) = t_c$ for both $c \in \{0,1\}$. Moreover, $t$ can be computed efficiently in parallel given $t_0, t_1$.

Proof. Let $p_{k_0}$ and $q_{k_1}$ be the nonnegative entries of $t_0$ and $t_1$, respectively. Let $s$ denote the sum of the entries of $t_0$, $t_1$ so that

\[ s = \sum_{k_0=1}^{\dim(\mathcal{A}_0)} p_{k_0} = \sum_{k_1=1}^{\dim(\mathcal{A}_1)} q_{k_1}. \]

If $s = 0$ then it is clear that the desired vector $t$ is the zero vector. For the remainder of the proof assume that $s \ne 0$. The desired vector $t$ has entries $t_{(k_0,k_1)}$ given by

\[ t_{(k_0,k_1)} = \frac{p_{k_0} q_{k_1}}{s}. \]

It is clear that this construction can be implemented efficiently in parallel.

Let us verify that $\mathrm{mar}_{\mathcal{A}_{\bar c}}(t) = t_c$ for both $c \in \{0,1\}$. For the case $c = 0$ the $k_0$th entry of $\mathrm{mar}_{\mathcal{A}_1}(t)$ is given by

\[ \sum_{k_1=1}^{\dim(\mathcal{A}_1)} \frac{p_{k_0} q_{k_1}}{s} = \frac{p_{k_0} s}{s} = p_{k_0} \]

as desired. The case $c = 1$ is handled similarly. □

Corollary 6.1. For both $c \in \{0,1\}$ let $T_c : \mathcal{S}_{01} \to \mathcal{A}_c$ be nonnegative matrices with $\mathrm{mar}_{\mathcal{A}_0}(T_0) = \mathrm{mar}_{\mathcal{A}_1}(T_1)$. There exists a nonnegative matrix $T : \mathcal{S}_{01} \to \mathcal{A}_{01}$ with $\mathrm{mar}_{\mathcal{A}_{\bar c}}(T) = T_c$ for both $c \in \{0,1\}$. Moreover, $T$ can be computed efficiently in parallel given $T_0, T_1$.

Proof. Apply Lemma 6 to each of the columns of $T_0, T_1$. □
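The vector built in the proof of Lemma 6 is a rescaled outer product of the two marginals. A minimal Python sketch, with a hypothetical function name and example values, is:

```python
def joint_from_marginals(t0, t1):
    """Return t[k0][k1] = t0[k0] * t1[k1] / s, where s is the common sum.

    Assumes t0 and t1 are nonnegative and have equal sums.
    """
    s = sum(t0)
    if s == 0:
        return [[0.0 for _ in t1] for _ in t0]
    return [[p * q / s for q in t1] for p in t0]

t0 = [0.125, 0.375]
t1 = [0.25, 0.25]
t = joint_from_marginals(t0, t1)

rows = [sum(row) for row in t]        # marginal on k0
cols = [sum(col) for col in zip(*t)]  # marginal on k1
print(t, rows, cols)
```

When $s = 1$ this is simply the product distribution, so the lemma says that any two marginals of equal total mass are jointly realizable by an (unnormalized) independent coupling.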



4 A parallel multiplicative weights algorithm 

In this section we complete the proof of our main result: every decision problem that admits a two-turn interactive proof with competing teams of no-signaling provers is also in PSPACE. Most of the detail appears in Section 4.1 wherein we present an efficient parallel oracle-algorithm based on the MWUM that produces $\delta$-optimal no-signaling strategies for the teams, given an oracle for "best responses" for Team Bob to a given candidate strategy for Alice. We describe an efficient parallel implementation of the required oracle in Section 4.2 from which the unconditional efficiency of our algorithm immediately follows. The ensuing inclusion of MRG$_{\mathrm{ns}}(2,2)$ inside PSPACE is discussed in Section 4.3.



4.1 The parallel algorithm 

Precise statements of the problem solved by our algorithm and the oracle it requires are given below. All 
input numbers are written as rational numbers in binary. For matrix inputs, each entry is written explicitly. 

Problem 1 (Weak no-signaling equilibrium). 
Input: A verifier matrix $V : \mathcal{S}_{01}\mathcal{T}_{01} \to \mathcal{A}_{01}\mathcal{B}_{01}$ and an accuracy parameter $\delta > 0$.
Oracle: Weak no-signaling optimization. (See Problem 2 below.)
Output: $\delta$-optimal no-signaling strategies $A$, $B$ for the min-max problem $\lambda(V)$.

Problem 2 (Weak no-signaling optimization).

Input: A verifier-Alice matrix $S : \mathcal{T}_{01} \to \mathcal{B}_{01}$ and an accuracy parameter $\delta > 0$.

Output: A $\delta$-optimal no-signaling strategy $\tilde B$ for Team Bob. (That is, a no-signaling matrix $\tilde B$ such that $\langle S, \tilde B \rangle \ge \langle S, B \rangle - \delta$ for all no-signaling matrices $B$.)

Given Corollary 4.1 it suffices to find $\delta$-optimal solutions $(\bar A, \bar A_0, \bar A_1)$ and $(\bar B, \bar\Pi_0, \bar\Pi_1)$ for $\mu(V)$ and then convert these solutions into $\delta$-optimal strategies for $\lambda(V)$. This method is codified in the algorithm of Figure 2.

This algorithm is a straightforward modification of the standard multiplicative weights update method for equilibrium problems. The precise formulation of the MWUM used in this paper is stated as Theorem 7. Our statement of this theorem is somewhat nonstandard: the result is usually presented in the form of an algorithm, whereas our presentation is purely mathematical. However, a cursory examination of the literature (say, Kale's thesis [Kal07, Chapter 2]) reveals that our mathematical formulation is equivalent to the more conventional algorithmic form.

Theorem 7 (Multiplicative weights update method; see Ref. [Kal07, Theorem 2]). Fix an $\epsilon \in (0, 1/2)$. Let $m^1, \dots, m^T$ be arbitrary $D$-dimensional "loss" vectors whose entries $m_i^t$ lie in the interval $[-\alpha, \alpha]$. Let $w^1, \dots, w^T$ be $D$-dimensional nonnegative "weight" vectors whose entries are given recursively via

\[ w_i^1 = 1 \]
\[ w_i^{t+1} = w_i^t \left( 1 - \epsilon m_i^t \right). \]

Let $p^1, \dots, p^T$ be probability vectors obtained by normalizing each $w^1, \dots, w^T$. For all probability vectors $p$ it holds that




Note that Theorem 7 holds for all choices of loss vectors $m^1, \dots, m^T$, including the case in which each $m^t$ is chosen adversarially based upon $w^t$. This adaptive selection of loss vectors is typical in implementations of the MWUM.
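The update rule is short enough to run directly. The Python sketch below applies it to random losses in $[-1, 1]$ and compares the algorithm's total expected loss with the standard regret bound from the Arora-Hazan-Kale survey of the method, namely $\sum_t m_i^t + \epsilon \sum_t |m_i^t| + \ln(D)/\epsilon$ for every fixed coordinate $i$. That inequality is used here only as a well-known reference point and is not necessarily the exact form in which Theorem 7 is stated. The function name, the random losses, and the parameter values are assumptions of this sketch.

```python
import math
import random

def mwum_distributions(losses, eps):
    """Run w_i <- w_i * (1 - eps * m_i) from all-ones weights; return p^1, ..., p^T."""
    weights = [1.0] * len(losses[0])
    dists = []
    for m in losses:
        total = sum(weights)
        dists.append([w / total for w in weights])  # p^t is the normalized w^t
        weights = [w * (1.0 - eps * mi) for w, mi in zip(weights, m)]
    return dists

random.seed(0)
T, D, eps = 200, 4, 0.1
losses = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(T)]
dists = mwum_distributions(losses, eps)

# Total expected loss incurred by the algorithm.
alg_loss = sum(sum(p_i * m_i for p_i, m_i in zip(p, m))
               for p, m in zip(dists, losses))

# Reference bound against each fixed coordinate i; by convexity it then
# also holds against every probability vector p.
bounds = [sum(m[i] for m in losses)
          + eps * sum(abs(m[i]) for m in losses)
          + math.log(D) / eps
          for i in range(D)]
print(all(alg_loss <= b for b in bounds))
```

The losses here are drawn in advance, but the guarantee does not rely on that: the same comparison goes through when each loss vector is chosen after inspecting the current weights.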

Proposition 8. The oracle-algorithm presented in Figure 2 solves the weak no-signaling equilibrium problem (Problem 1). Assuming unit cost for the oracle, this algorithm can be implemented in parallel with run time bounded by a polynomial in $1/\delta$ and $\log(\dim(\mathcal{S}_{01}\mathcal{T}_{01}\mathcal{A}_{01}\mathcal{B}_{01}))$.

Proof. For each pair $i = (i_0,i_1)$ of questions let $\pi_i$ denote the probability with which the verifier asks questions $i$ to Team Alice. Let $m_i^t$ denote the $i$th column of $M^t$ for each $t = 1, \dots, T$. We argue that the entries of $m_i^t$ lie in the interval $[0, 3\pi_i]$. To this end, observe that the loss matrix $M^t$ is defined in Figure 2 via the adjoint mapping $f_V^*$ as

\[ M^t = \Phi_V^*(B^t) + e_{\mathcal{A}_1} \otimes \Pi_0^t + e_{\mathcal{A}_0} \otimes \Pi_1^t \le 3 e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^* \]

where the inequality follows immediately from the bound $\Phi_V^*(B) \le e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^*$ of Proposition 3 and the restriction $\Pi_c \le e_{\mathcal{A}_c} p_{\mathrm{Alice}}^*$ on penalty matrices. The desired bound on the entries of $m_i^t$ follows from the observation that the $i$th column of $3 e_{\mathcal{A}_{01}} p_{\mathrm{Alice}}^*$ is the vector whose entries are all equal to $3\pi_i$.
1. Lete = S/W and let T



lii(dim(A)i)) 



Let $(W^1, W_0^1, W_1^1)$ denote the triple of all-ones matrices and let $(A^1, A_0^1, A_1^1)$ denote the uniformly random strategy for Alice obtained by normalizing the columns of $(W^1, W_0^1, W_1^1)$.

2. Repeat for each $t = 1, \dots, T$:

(a) Compute optimal penalties $\Pi_0^t, \Pi_1^t$ for $(A^t, A_0^t, A_1^t)$ as described in Section 3.2. Use the oracle for Problem 2 to obtain a $\delta/2$-best response $B^t$ to the verifier-Alice matrix $\Phi_V(A^t)$.

(b) Compute the loss matrices $(M^t, M_0^t, M_1^t) = f_V^*(B^t, \Pi_0^t, \Pi_1^t)$. Exit the loop now if $t = T$.

(c) Update the weight matrices according to the standard multiplicative weights update rule:

\[ (W^{t+1}, W_0^{t+1}, W_1^{t+1}) = (W^t, W_0^t, W_1^t) \circ \Big( \underbrace{(W^1, W_0^1, W_1^1)}_{\text{all-ones matrices}} - \epsilon \, (M^t, M_0^t, M_1^t) \Big) \]

where $\circ$ denotes the (entrywise) matrix Schur product. (See Theorem 7.)

(d) Compute the updated triple $(A^{t+1}, A_0^{t+1}, A_1^{t+1})$ of stochastic matrices for Team Alice by normalizing the columns of $(W^{t+1}, W_0^{t+1}, W_1^{t+1})$.

3. Compute

\[ (\bar A, \bar A_0, \bar A_1) = \frac{1}{T} \sum_{t=1}^{T} (A^t, A_0^t, A_1^t) \quad \text{and} \quad (\bar B, \bar\Pi_0, \bar\Pi_1) = \frac{1}{T} \sum_{t=1}^{T} (B^t, \Pi_0^t, \Pi_1^t) \]

both of which are $\delta$-optimal for $\mu(V)$. Compute the no-signaling matrix $A_{\mathrm{ns}}$ from $(\bar A, \bar A_0, \bar A_1)$ as described in Corollary 4.1.

4. Return $(A_{\mathrm{ns}}, \bar B)$ as the $\delta$-optimal strategies of Team Alice and Team Bob for $\lambda(V)$.

Figure 2: Algorithm that finds $\delta$-optimal solutions to the equilibrium problem $\lambda(V)$ for two-turn interactive proofs with competing teams of no-signaling provers (Problem 1).
Let $a_i^t$ denote the $i$th column of $A^t$ for $t = 1, \dots, T$. It is clear that the construction of the probability vectors $a_i^t$ in terms of the loss vectors $m_i^t$ presented in Figure 2 obeys the condition of Theorem 7. It therefore follows that for any probability vector $a_i \in \mathcal{A}_{01}$ we have



t=i \ t=i 

Summing these inequalities over all columns i we find that for any stochastic matrix A it holds that 

T / , T 

T 



t = l \ t = l 

A similar bound on the stochastic matrices $A_0, A_1$ in terms of the loss matrices $M_0^t, M_1^t$ can be derived in much the same way. For completeness, let us make this argument explicit. For both $c \in \{0,1\}$ and for each question $i_c$ let $\pi_{i_c}$ denote the probability with which the verifier asks question $i_c$ to Alice$_c$. Let $m_{i_c}^t$ denote the $i_c$th column of $M_c^t$ for each $t = 1, \dots, T$. We argue that the entries of $m_{i_c}^t$ lie in the interval $[-\pi_{i_c}, 0]$. Recall the loss matrix $M_c^t$ is defined in Figure 2 via the adjoint mapping $f_V^*$ as

\[ M_c^t = -\Pi_c^t \left( I_{\mathcal{S}_c} \otimes e_{\mathcal{S}_{\bar c}} \right) \ge -e_{\mathcal{A}_c} \, \mathrm{mar}_{\mathcal{S}_{\bar c}}(p_{\mathrm{Alice}})^* \]

where the inequality follows immediately from the restriction $\Pi_c \le e_{\mathcal{A}_c} p_{\mathrm{Alice}}^*$ on penalty matrices. The desired bound on the entries of $m_{i_c}^t$ follows from the observation that the $i_c$th column of $e_{\mathcal{A}_c} \, \mathrm{mar}_{\mathcal{S}_{\bar c}}(p_{\mathrm{Alice}})^*$ is the vector whose entries are all equal to $\pi_{i_c}$.

As above, let $a_{i_c}^t$ denote the $i_c$th column of $A_c^t$ for $t = 1, \dots, T$. It is clear that the construction of the probability vectors $a_{i_c}^t$ in terms of the loss vectors $m_{i_c}^t$ presented in Figure 2 obeys the condition of Theorem 7. It therefore follows that for any probability vector $a_{i_c} \in \mathcal{A}_c$ we have

^ , ln(dim(^c)) 
+ n.Ae + 

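Summing these inequalities over all columns $i_c$ we find that for any stochastic matrix $A_c$ it holds that

At this point we have derived three inequalities for three arbitrary stochastic matrices $A, A_0, A_1$. Summing these inequalities and substituting $(M^t, M_0^t, M_1^t) = f_V^*(B^t, \Pi_0^t, \Pi_1^t)$ and the choices of $\epsilon$, $T$ listed in Figure 2 we find that for any triple $(A, A_0, A_1)$ of stochastic matrices it holds that

\[ \frac{1}{T} \sum_{t=1}^{T} \langle f_V(A^t, A_0^t, A_1^t), (B^t, \Pi_0^t, \Pi_1^t) \rangle \le \Big\langle f_V(A, A_0, A_1), \frac{1}{T} \sum_{t=1}^{T} (B^t, \Pi_0^t, \Pi_1^t) \Big\rangle + \delta/2. \qquad (1) \]

The remainder of this proof is a straightforward adaptation of Kale's analysis for the much simpler class of two-player zero-sum games in normal form [Kal07, Section 2.3.1]. We argue that the triples $(\bar A, \bar A_0, \bar A_1)$ and $(\bar B, \bar\Pi_0, \bar\Pi_1)$ appearing in Figure 2 are $\delta$-optimal for $\mu(V)$. Let us begin with the triple $(\bar A, \bar A_0, \bar A_1)$. Choose any $(B, \Pi_0, \Pi_1)$ and let $(A^\star, A_0^\star, A_1^\star)$ be optimal for $\mu(V)$. We have

\[ \begin{aligned}
\Big\langle \frac{1}{T} \sum_{t=1}^{T} f_V(A^t, A_0^t, A_1^t), (B, \Pi_0, \Pi_1) \Big\rangle
&\le \frac{1}{T} \sum_{t=1}^{T} \langle f_V(A^t, A_0^t, A_1^t), (B^t, \Pi_0^t, \Pi_1^t) \rangle + \delta/2 \\
&\le \Big\langle f_V(A^\star, A_0^\star, A_1^\star), \frac{1}{T} \sum_{t=1}^{T} (B^t, \Pi_0^t, \Pi_1^t) \Big\rangle + \delta \le \mu(V) + \delta
\end{aligned} \]

as desired. (By linearity of $f_V$ the leftmost expression equals $\langle f_V(\bar A, \bar A_0, \bar A_1), (B, \Pi_0, \Pi_1) \rangle$. The first inequality is because each $(B^t, \Pi_0^t, \Pi_1^t)$ is a $\delta/2$-best response to $(A^t, A_0^t, A_1^t)$; the second is Eq. (1).)

To see that $(\bar B, \bar\Pi_0, \bar\Pi_1)$ is $\delta$-optimal for $\mu(V)$, let $(A, A_0, A_1)$ be any triple of stochastic matrices. We have

\[ \Big\langle f_V(A, A_0, A_1), \frac{1}{T} \sum_{t=1}^{T} (B^t, \Pi_0^t, \Pi_1^t) \Big\rangle \ge \frac{1}{T} \sum_{t=1}^{T} \langle f_V(A^t, A_0^t, A_1^t), (B^t, \Pi_0^t, \Pi_1^t) \rangle - \delta/2 \ge \mu(V) - \delta \]

as desired. (The first inequality is Eq. (1); the second is because each $(B^t, \Pi_0^t, \Pi_1^t)$ is a $\delta/2$-best response to $(A^t, A_0^t, A_1^t)$.) Finally, it follows from Corollary 4.1 that $A_{\mathrm{ns}}$ and $\bar B$ are $\delta$-optimal strategies for $\lambda(V)$.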

That the algorithm admits an efficient parallel implementation is straightforward. In each iteration, computations of optimal penalties, the loss matrices (via $f_V^*$), the multiplicative weights update rule, and normalization are all simple operations involving only addition and multiplication of individual rational entries of matrices that can easily be implemented in parallel. Efficiency follows from the fact that the total number of iterations is bounded by a polynomial in $1/\delta$ and the logarithm of $\dim(\mathcal{S}_{01}\mathcal{T}_{01}\mathcal{A}_{01}\mathcal{B}_{01})$, the size of the verifier matrix. □
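The overall shape of the algorithm in Figure 2 (multiplicative weights for the minimizing side, best responses for the maximizing side, averaged iterates as output) can be seen in the much simpler setting of two-player zero-sum matrix games to which the proof refers. The Python sketch below is that simplified setting only: it has no penalties, no witnesses, and an exact best response instead of an oracle. The game, the function name, and the parameters are assumptions chosen for illustration.

```python
import math

def solve_zero_sum(G, eps, T):
    """Approximate equilibrium of a zero-sum game with losses G[k][l] in [0, 1].

    The row player minimizes and runs multiplicative weights; the column
    player maximizes and best-responds each round. Returns averaged strategies.
    """
    n_rows, n_cols = len(G), len(G[0])
    weights = [1.0] * n_rows
    p_avg = [0.0] * n_rows
    q_avg = [0.0] * n_cols
    for _ in range(T):
        total = sum(weights)
        p = [w / total for w in weights]
        # Column player's payoff for each pure response to the current p.
        payoff = [sum(p[k] * G[k][l] for k in range(n_rows)) for l in range(n_cols)]
        best = max(range(n_cols), key=lambda l: payoff[l])
        # The loss vector seen by the row player is the best-response column.
        loss = [G[k][best] for k in range(n_rows)]
        weights = [w * (1.0 - eps * m) for w, m in zip(weights, loss)]
        p_avg = [a + x / T for a, x in zip(p_avg, p)]
        q_avg[best] += 1.0 / T
    return p_avg, q_avg

# Matching pennies with value 1/2: the row player loses 1 when the choices match.
G = [[1.0, 0.0], [0.0, 1.0]]
eps, T = 0.05, 2000
p_avg, q_avg = solve_zero_sum(G, eps, T)

worst_case = max(sum(p_avg[k] * G[k][l] for k in range(2)) for l in range(2))
guarantee = min(sum(G[k][l] * q_avg[l] for l in range(2)) for k in range(2))
slack = eps + math.log(2) / (eps * T)
print(worst_case, guarantee, slack)
```

The averaged row strategy concedes at most the game value plus `slack`, and the averaged column strategy secures at least the value minus `slack`, mirroring the two halves of the optimality argument above.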



4.2 Implementations of the best-response oracle for Team Bob 

In order for the algorithm of Figure 2 to be unconditionally efficient, we require a parallel implementation of the oracle for weak no-signaling optimization (Problem 2). Fortunately, all the work is already done: Problem 2 is the optimization problem that arises naturally from two-turn, two-prover interactive proofs with no-signaling provers. Thus, the parallel algorithm of Ito [Ito10] can be re-used to implement the oracle in our algorithm without complication.

In Ito's terminology, the verifier-Alice matrix $\Phi_V(A)$ specifies a game and the two no-signaling provers comprising Team Bob are the players. Ito does not claim that an explicit strategy for the players can be found efficiently in parallel. Rather, he claims only that the task of distinguishing high success probability from low success probability admits a parallel algorithm, as this simpler task is sufficient to put MIP$_{\mathrm{ns}}(2,2)$ inside PSPACE. However, a cursory glance at the details of Ito's proof reveals a parallel construction of near-optimal no-signaling strategies for the players as required by Problem 2.

Alternatively, the oracle for weak no-signaling optimization (Problem 2) can be implemented by re-using the algorithm for weak no-signaling equilibrium (Problem 1) listed in Figure 2 of the present paper. Indeed, Problem 2 is a special case of Problem 1 in which one team has a trivial strategy space. In this special case the required "oracle" demands only weak no-signaling optimization over a trivial strategy space, which of course admits a trivial parallel implementation. In other words, the algorithm of Figure 2 can be used in a two-level recursive fashion to give an unconditionally efficient parallel algorithm for Problem 1.
4.3 Containment in PSPACE

The desired containment of MRG$_{\mathrm{ns}}(2,2)$ inside PSPACE now follows in the usual way:

Theorem 1. Every decision problem that admits a two-turn interactive proof with competing teams of two no-signaling provers per team is also in PSPACE. Thus, we obtain the identity MRG$_{\mathrm{ns}}(2,2)$ = PSPACE.

Proof. Let $L$ be a decision problem in MRG$_{\mathrm{ns}}(2,2)$ with completeness $c$ and soundness $s$ and let $x$ be any input string. Each entry of the exponential-size verifier matrix $V : \mathcal{S}_{01}\mathcal{T}_{01} \to \mathcal{A}_{01}\mathcal{B}_{01}$ induced by the verifier on input $x$ can be computed in space polynomial in $|x|$ by simulating every choice of randomness for the verifier. In order to decide whether $x$ is a yes-instance or no-instance of $L$ it suffices to find $\delta$-optimal strategies for the teams for $\delta = (c - s)/3$, which permits us to distinguish $\lambda(V) \ge c$ from $\lambda(V) \le s$. It follows from Proposition 8 and the discussion in Section 4.2 that the algorithm of Figure 2 can be used to find $\delta$-optimal strategies for the teams and can be implemented in parallel with run time bounded by a polynomial in $1/\delta$ and the logarithm of the dimensions of $V$. As the dimensions of $V$ scale exponentially with $|x|$ and $\delta$ scales as an inverse polynomial in $|x|$ the total run time of this parallel algorithm scales polynomially with $|x|$ and can therefore be simulated in polynomial space in the usual way [Bor77]. □
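The arithmetic behind the choice $\delta = (c - s)/3$ can be made explicit. One way to use $\delta$-optimal strategies, assumed here for illustration and not spelled out in the proof, is to evaluate the objective on the returned pair: the resulting number is within $\delta$ of $\lambda(V)$, so comparing it with the midpoint $(c+s)/2$ separates the two cases. The function name and the sample values of $c$ and $s$ are hypothetical.

```python
from fractions import Fraction

def lambda_at_least_c(estimate, c, s):
    """Decide between lambda >= c and lambda <= s from an estimate of lambda.

    Assumes |estimate - lambda| <= delta with delta = (c - s) / 3, and that
    exactly one of the two cases holds.
    """
    delta = (c - s) / 3
    # If lambda >= c the estimate is at least c - delta; if lambda <= s it is
    # at most s + delta. These thresholds are separated by a gap of delta.
    assert s + delta < c - delta
    return estimate >= (c + s) / 2

c, s = Fraction(2, 3), Fraction(1, 3)
delta = (c - s) / 3
print(lambda_at_least_c(c - delta, c, s))  # worst case when lambda >= c
print(lambda_at_least_c(s + delta, c, s))  # worst case when lambda <= s
```

Exact rational arithmetic is used because the inputs in Section 4.1 are stated to be rational numbers written in binary.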